Lab Week 5 Objectives:
Required packages:
library(tidyverse)
## ── Attaching packages ──────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 2.0.1 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## Warning: package 'tibble' was built under R version 3.5.2
## ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tseries)
library(forecast)
## Warning: package 'forecast' was built under R version 3.5.2
energy <- read_csv("energy.csv")
## Parsed with column specification:
## cols(
## month = col_character(),
## res_total = col_double(),
## ind_total = col_double()
## )
res_ts <- ts(energy$res_total, frequency = 12, start = c(1973,1))
# res_ts
Plot those…
plot(res_ts)
For each, we should ask ourselves: - Is there a trend? - Do data look additive or multiplicative? - Is there seasonality? - Are there notable outliers?
res_dc <- decompose(res_ts)
plot(res_dc)
# Changes within each month over years recorded:
monthplot(res_ts)
ggseasonplot(res_ts) +
theme_bw()
Using ma function in forecast package
# Have them see what happens when they change the moving window...
sma_res <- ma(res_ts, order = 5)
plot(res_ts)
lines(sma_res, col = "red")
# Basic way:
res_acf <- acf(res_ts)
# More information:
ggtsdisplay(res_ts)
Not surprising: strong seasonality is dominant. There appears to be some trend. It looks relatively additive. Can we test for stationarity?
Hypothesis test: null is that the data are NOT stationary. If p < 0.05, we reject the null hypothesis and retain the alternative hypothesis that the data ARE stationary.
adf_res <- adf.test(res_ts) # Yes, stationary
## Warning in adf.test(res_ts): p-value smaller than printed p-value
adf_res # p-value = 0.01
##
## Augmented Dickey-Fuller Test
##
## data: res_ts
## Dickey-Fuller = -11.342, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
# Exponential smoothing: no normality assumption (unbiased)
# Perform Holt Winters
res_hw <- HoltWinters(res_ts) # See smoothing parameters with res_hw
plot(res_hw)
# Then forecast
res_forecast <- forecast(res_hw, h = 60)
plot(res_forecast)
Then check the residuals:
hist(res_forecast$residuals) # Look normally distributed.
res_pdq <- auto.arima(res_ts) # [1,0,1][0,1,1]
res_pdq
## Series: res_ts
## ARIMA(1,0,2)(0,1,1)[12] with drift
##
## Coefficients:
## ar1 ma1 ma2 sma1 drift
## 0.990 -0.4821 -0.4196 -0.8128 0.7520
## s.e. 0.011 0.0407 0.0397 0.0253 0.5872
##
## sigma^2 estimated as 7326: log likelihood=-3090.52
## AIC=6193.05 AICc=6193.21 BIC=6218.64
res_arima <- arima(res_ts, order = c(1,0,1), seasonal = list(order = c(0,1,1)))
par(mfrow = c(1,2))
hist(res_arima$residuals)
qqnorm(res_arima$residuals)
forecast_res <- forecast(res_arima, h = 72)
plot(forecast_res)
res_df <- data.frame(forecast_res)
month_seq <- seq(1,72)
res_df_2 <- data.frame(month_seq, res_df) # View this data frame...
ggplot(res_df_2, aes(x = month_seq, y = Point.Forecast)) +
geom_line() +
geom_ribbon(aes(ymin = Lo.95, ymax = Hi.95, alpha = 0.2)) +
theme_minimal()